Improve audio splitting in dataset generation #419
base: alltalkbeta
@Yohrog will have to look at this tomorrow. It's both late here and I'm a little caught up dealing with people on other support bits! Thanks for sending it over though, will get back to you later. |
@erew123 No worries. I mean there's no rush with this anyway, and as I said, there are still some changes I'm unhappy with. Take your time and I'll update it over time. Once I'm happy I'll mark it as ready. No need to review it before that if you don't find the time :) |
@erew123 A quick question regarding the final saved files. During dataset generation, just before we save the snippets, we separate out the sentences and trim them, resulting in a lot of files that are shorter than two seconds. Is that sentence splitting and trimming of the files really necessary? Because it results in a lot of lost training data and in the functions before that we go the extra mile to try and EXTEND those segments past two seconds. It seems like we're doing and undoing the same thing. Trimming happens here: Line 1764 in f16f6b9
|
Hi @Yohrog Sorry it's taken a while to get back to you, just busy with bug fixing, support tickets etc. Every time I tried to make a moment to reply there would be another email/bug/something going on. So the idea was that if something got chopped up too small, you would whack it back together with something else, but then of course you may need to find the boundaries/edges of the newly merged audio, which I guess is what you are getting at: that it can result in chopping away data there. But that was the principle of it. So I guess we would say that the original code was:
And I think your proposed code is (I don't know where you are at now with it of course):
And you're storing extra information about the sentence boundaries throughout the whole process? That's my best guess/rough take at what you're proposing. Sorry if I've got that wrong, I think my head is spinning with all sorts of code after the last 16 hours :/ |
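For readers following the thread, the "whack it back together" idea can be sketched roughly as below. This is a minimal illustration and not the project's code: the `Segment` class, the `merge_short_segments` function, and the 3 s / 11 s defaults are all hypothetical names and values chosen for the example. It assumes segments are already sorted by start time, and the merged span keeps any silence between the joined segments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical container for one transcribed span of audio."""
    start: float  # seconds
    end: float    # seconds
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_short_segments(
    segments: List[Segment],
    min_len: float = 3.0,   # assumed lower target, in seconds
    max_len: float = 11.0,  # assumed upper target, in seconds
) -> List[Segment]:
    """Greedily join a too-short segment with the next one.

    A merge only happens while the previous segment is still shorter than
    min_len and the combined span would not exceed max_len.
    """
    merged: List[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.duration < min_len
            and seg.end - prev.start <= max_len
        ):
            # Extend the previous segment to cover this one and join the text.
            merged[-1] = Segment(prev.start, seg.end, f"{prev.text} {seg.text}".strip())
        else:
            merged.append(Segment(seg.start, seg.end, seg.text))
    return merged


example = [
    Segment(0.0, 1.0, "Hi."),
    Segment(1.2, 2.0, "Yes."),
    Segment(2.1, 6.0, "Long one."),
    Segment(6.2, 7.0, "Ok."),
]
result = merge_short_segments(example)
```

Note that a short segment at the very end (the last one in `example`) has nothing to merge forward into, so it stays short. A real pipeline would need to decide whether to merge it backwards or drop it.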
@Yohrog Just to be double double sure of things, I have been through all the Coqui training scripts with 2x AIs (too much for me to comprehend and pull together in my head). TLDR 1: 3 seconds to 11 seconds is a good audio clip length and will pass through ALL the Coqui scripts, Huggingface scripts etc. As such, 3 second minimum to 11 second maximum would appear to be a very good spot to aim for with audio clip size. Here are some snippets of the AI responses: Based on the provided files and configurations, here’s the detailed analysis and conclusion regarding the minimum and maximum lengths of audio clips for training and how the model handles clips longer than its defined limits: Minimum and Maximum Audio Clip Length for Training
Handling Audio Longer Than 11.6 Seconds
Epoch-Wise Processing of Longer Audio
Final Constraints
Max audio length will over-ride with this setting.....obviously at the expense of memory. Finetuning is set at the 11.6 seconds as that is the Coqui suggested default... Re, will it skip anything over 11.6 seconds OR use only the first 11.6 seconds of a longer file: When I say, "Audio longer than ~11.6 seconds is excluded entirely," it means that the entire file would likely be skipped during training. This happens because the dataset preprocessing pipeline ( Here’s why:
How to Test or Modify This Behavior
By default, the current implementation appears to skip files entirely if they exceed the length cap. Modifications would be required to enable partial usage of longer files. Let me know if you’d like help pinpointing where to make these changes! |
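As a rough way to check a generated dataset against the 3 to 11 second target mentioned above, something like the following could be used. It is only a sketch: `bucket_clips` and its inputs are hypothetical, the thresholds are the ones suggested in this thread, and whether the trainer really skips over-length files entirely is taken from the AI-generated analysis quoted above rather than verified here.

```python
from typing import Dict, List


def clip_seconds(num_samples: int, sample_rate: int) -> float:
    """Convert a sample count to a duration in seconds."""
    return num_samples / sample_rate


def bucket_clips(
    durations: Dict[str, float],
    min_len: float = 3.0,   # lower bound suggested in the thread
    max_len: float = 11.0,  # upper bound suggested in the thread
) -> Dict[str, List[str]]:
    """Sort clip names into keep / too_short / too_long by duration."""
    buckets: Dict[str, List[str]] = {"keep": [], "too_short": [], "too_long": []}
    for name, seconds in durations.items():
        if seconds < min_len:
            buckets["too_short"].append(name)
        elif seconds > max_len:
            # Reportedly skipped by the trainer rather than truncated.
            buckets["too_long"].append(name)
        else:
            buckets["keep"].append(name)
    return buckets


# Made-up durations purely for illustration.
report = bucket_clips({"a.wav": 1.5, "b.wav": 3.0, "c.wav": 11.0, "d.wav": 12.4})
```

Running this over the output folder after dataset generation would show how many clips fall outside the range, which is the data-loss concern raised earlier in the thread.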
Hi @erew123, I'm back on it now and will finish it today (and test for bugs on my end).
If you'd like anything different let me know. |
Hi @Yohrog Thanks for your reply and I completely understand! We all have life to get on with and I certainly have my own fair share of life going on! Thanks so much though, what you have managed to achieve sounds awesome and I look forward to testing it! And of course, no rush! I've got plenty to be on with myself, but I will test it whenever you send it over! The only 1x thing I did for someone in the last few days was add "hi" (Hindi) as an option, as the 2.0.3 model supports Hindi.....but Whisper didn't appear to work! Not sure if you want to push "hi" in your code... That's my conversation with them and this is Whisper saying it supports Hindi https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L28 And just as I type this.... it hits me......I bet it's Silero that doesn't support certain languages!!... damn haha https://github.com/snakers4/silero-models?tab=readme-ov-file#further-reading I guess I will have to put a note in the help and an auto disable for Silero if it's not. As I say, no panic! Enjoy your weekend and it gets here when it gets here :) Thanks again! |
Hi @Yohrog Sorry it's taken me a day or so to get back to you! Thanks for working on this, I know how much of a pain it is to be having to make a code change, run a dataset, look at it, go back to the code and repeat etc. Trust me, I'm building the RVC training at the moment and that's not been a happy time haha. Anyway, I used your build from the last update: Unfortunately, that threw a bug. I'm guessing maybe you uploaded an in-progress version? I did re-download the file, just to be double sure I hadn't downloaded the wrong one.
Happy to take a look if you're too busy. Let me know. Thanks |
@erew123 Yeah, it started throwing more bugs than I would've liked and I'm still weeding them out. Thanks for testing it, I appreciate the stack trace! |
@Yohrog I have to travel for 5-8 days anyway, so wouldn't be able to test etc, so as always, no rush! I wasn't sure with your last upload if I should test or not, but as I have to travel, thought I'd at least try it and let you know. |
This still has a lot of things that I changed during testing and might revert. These are the settings I used to get it to work locally.
Feel free to comment on any changes and I'll try to explain them.